Dealing with Missing Data

Authors

  • S. Greco
  • B. Matarazzo
  • R. Slowinski
Abstract

Rough set methodology is a useful tool for the analysis of decision problems concerning a set of objects described in a data table by a set of condition attributes and a set of decision attributes. In practical applications, however, the data table is often incomplete because some data are missing. To deal with this case, we propose an extension of the rough set methodology to the analysis of incomplete data tables. The adaptation concerns both the classical rough set approach, based on indiscernibility relations, and the new rough set approach, based on dominance relations. While the first approach deals with the multi-attribute classification problem, the second deals with the multi-criteria sorting problem. In the latter, condition attributes have preference-ordered scales, and are thus called criteria, and the classes defined by the decision attributes are also preference-ordered. The adapted relations of indiscernibility or dominance between a pair of objects are considered as directional statements in which a subject is compared to a referent object. We require that the referent object has no missing data. The two adapted rough set approaches reduce to the original approaches when there are no missing data. The rules induced from the newly defined rough approximations are either exact or approximate, depending on whether they are supported by consistent objects or not, and they are robust in the sense that each rule is supported by at least one object with no missing data on the condition attributes or criteria represented in the rule.
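The directional indiscernibility relation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' formal definition: it assumes that a missing value is encoded as `None`, that a subject matches a referent on an attribute when the subject's value is missing or equal to the referent's, and that a referent with any missing value on the considered attributes is never matched. The names `MISSING`, `is_complete`, `indiscernible`, `matching_subjects`, and the toy table are all hypothetical.

```python
from typing import Dict, Hashable, Iterable, Optional, Set

# Assumed encoding: a missing attribute value is stored as None.
MISSING = None

Row = Dict[str, Optional[Hashable]]


def is_complete(obj: Row, attrs: Iterable[str]) -> bool:
    """True if the object has a known value on every attribute in attrs."""
    return all(obj.get(a) is not MISSING for a in attrs)


def indiscernible(subject: Row, referent: Row, attrs: Iterable[str]) -> bool:
    """Directional check: is the subject indiscernible with the referent?

    The referent must have no missing data on attrs. On each attribute,
    the subject either has a missing value or agrees with the referent.
    """
    attrs = list(attrs)
    if not is_complete(referent, attrs):
        return False
    return all(
        subject.get(a) is MISSING or subject.get(a) == referent[a]
        for a in attrs
    )


def matching_subjects(table: Dict[str, Row], referent_name: str,
                      attrs: Iterable[str]) -> Set[str]:
    """Names of all objects that are indiscernible with the given referent."""
    attrs = list(attrs)
    referent = table[referent_name]
    return {
        name for name, row in table.items()
        if indiscernible(row, referent, attrs)
    }


# Hypothetical toy data table with one missing value (object "b").
table: Dict[str, Row] = {
    "a": {"color": "red", "size": "big"},
    "b": {"color": MISSING, "size": "big"},
    "c": {"color": "blue", "size": "big"},
    "d": {"color": "red", "size": "small"},
}
attrs = ["color", "size"]

b_vs_a = indiscernible(table["b"], table["a"], attrs)  # subject b, referent a
a_vs_b = indiscernible(table["a"], table["b"], attrs)  # subject a, referent b
block_a = matching_subjects(table, "a", attrs)
```

In this toy table, `b` is indiscernible with the complete referent `a`, but the reverse statement fails because `b` is not an admissible referent, which is what makes the relation directional. When no value is `None`, the check reduces to plain equality on all attributes, mirroring the abstract's remark that the adapted approach coincides with the original one for complete tables.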

Similar resources

Several approaches to handling missing values of quantitative variables and their effect on the results of a clinical trial

Background and Objectives: A major challenge affecting longitudinal studies is the problem of missing data. Missing data may result in the loss of part of the information, which reduces the accuracy of the estimator and makes the results biased and inaccurate. Therefore, it is necessary to evaluate the missing data mechanism from a longitudinal research and to consider thi...

Probabilistic Linkage of Persian Record with Missing Data

Extended Abstract. When the comprehensive information about a topic is scattered among two or more data sets, using only one of those data sets would lead to information loss available in other data sets. Hence, it is necessary to integrate scattered information to a comprehensive unique data set. On the other hand, sometimes we are interested in recognition of duplications in a data set. The i...

Maximum likelihood analysis of the logistic regression model when predictor variable data are incomplete but auxiliary variables are available

Background and Objectives: Missing data exist in many studies, e.g. in regression models, and they decrease the model's efficacy. Many methods have been suggested for handling incomplete data; they have generally focused on missing outcome values. But covariate values can also be missing. Materials and Methods: In this paper we study the missing imputation by the EM algorithm and auxiliary varia...

Dealing with missing data in a multi-question depression scale: a comparison of imputation methods

BACKGROUND Missing data present a challenge to many research projects. The problem is often pronounced in studies utilizing self-report scales, and literature addressing different strategies for dealing with missing data in such circumstances is scarce. The objective of this study was to compare six different imputation techniques for dealing with missing data in the Zung Self-reported Depressi...

Frequency Ratio: a method for dealing with missing values within nearest neighbour search

In this paper we introduce the Frequency Ratio (FR) method for dealing with missing values within nearest neighbour search. We test the FR method on known medical datasets from the UCI machine learning repository. We compare the accuracy of the FR method with five commonly used methods (three “imputation” and two “bypassing” methods) for dealing with values that are “missing completely at rando...



Publication date: 2002